Improving Text Classification Performance with the Analysis of Lexical Dependencies and Class-based Feature Selection

Authors

  • Levent Özgür
  • Fikret S. Gürgen

Abstract

Akın for his valuable comments during the thesis defense, and also for his trust in me in offering a part-time instructorship position in the department during my studies. I had a long and confusing journey with this thesis over the last seven years. Unfortunately, since I was not an idealist full-time PhD researcher all the time, the PhD dissertation seemed to be the main motivation that seriously affected all my critical life decisions in this period. During this long time, I had the chance and motivation to experience many related occupations in parallel: I worked full time in three companies (including a very tough entrepreneurship experience, Doğa Teknoloji), lectured at two different universities and participated in a research project besides my PhD studies. A great number of valuable mates were involved in some part of this long journey, and some of them even changed its course significantly. The physical limits of this section are surely not adequate to explicitly mention all of these valuable people. In general, I want to thank all my work-mates in the former companies (Lipman, Doğa Teknoloji and TIMw.e.) and universities (Boğaziçi University and Okan University), lab-mates (AILab), roommates (ETA14) and, more generally, all CmpE-mates in the department, all my former and current house-mates in Akıncı Palace (!) and, of course, all my beloved buddies.
This thesis would not have been completed in its current form without the following core group from different associations. I will try to mention each of them with their most specific contribution during the journey: Arzucan Özgür (together, we shaped the very critical introductory steps of the thesis), Murat Kayıhan (my beloved manager at Lipman during the coursework period of the PhD, who allowed me very valuable time for PhD classes and research), Mehmet Gürmen (my beloved ex-partner, who supported my PhD challenges during the proposal and early progress period of the invaluable entrepreneurship experience: Doğa Teknoloji and SeyYar), Prof. Ahmet Kaşlı (as the department head at Okan University, he showed great empathy, flexibility and support in allocating very critical time for the thesis), Aslı Uyar Özkaya (most of the administrative and related decisions, as well as some critical technical issues, in the long run of the thesis bear her mark) and Prof. Nadir Yücel (his colorful life experiences, priceless life advice and wisdom have had a very strong impact on my thinking; I believe he will completely recover from his recent illness …

Similar resources

A Novel One Sided Feature Selection Method for Imbalanced Text Classification

Imbalanced data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. Classification algorithms tend to favor the large class and may even treat minority-class data as outliers. The text data is one of t...

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in the amount of information, tools and methods are needed to search, filter and manage resources. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, a key goal in text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...

A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection

The k-nearest neighbor (KNN) algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Although the KNN algorithm is highly effective in many cases, it has some essential deficiencies that affect its classification accuracy. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...

Improving Chernoff criterion for classification by using the filled function

Linear discriminant analysis is a well-known matrix-based dimensionality reduction method. It is a supervised feature extraction method used in two-class classification problems. However, it is incapable of dealing with data in which the classes have unequal covariance matrices. To address this issue, the Chernoff distance is an appropriate criterion for measuring distances between distributions. In the p...

Feature selection using genetic algorithm for classification of schizophrenia using fMRI data

In this paper we propose a new method for classification of subjects into schizophrenia and control groups using functional magnetic resonance imaging (fMRI) data. In the preprocessing step, the number of fMRI time points is reduced using principal component analysis (PCA). Then, independent component analysis (ICA) is used for further data analysis. It estimates independent components (ICs) of...

A New Framework for Distributed Multivariate Feature Selection

Feature selection is considered an important issue in the classification domain. Selecting features by the criteria of maximum relevance to the class label and minimum redundancy among features helps improve classification accuracy. However, most current feature selection algorithms only work in a centralized setting. In this paper, we suggest a distributed version of the mRMR featu...

Publication date: 2010